{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "62881f0e",
   "metadata": {},
   "source": [
    "Copyright (c) 2021, salesforce.com, inc. \\\n",
    "All rights reserved. \\\n",
    "SPDX-License-Identifier: BSD-3-Clause \\\n",
    "For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80b714d6",
   "metadata": {},
   "source": [
    "**Try this notebook on [Colab](http://colab.research.google.com/github/salesforce/warp-drive/blob/master/tutorials/tutorial-2-warp_drive_sampler.ipynb)!**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59759a7f",
   "metadata": {},
   "source": [
    "## ⚠️ PLEASE NOTE:\n",
    "This notebook runs on a GPU runtime.\\\n",
    "If running on Colab, choose Runtime > Change runtime type from the menu, then select 'GPU' in the dropdown."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c10c310",
   "metadata": {},
   "source": [
    "# Welcome to WarpDrive!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "140cdcb0",
   "metadata": {},
   "source": [
    "This is the second tutorial on WarpDrive, a PyCUDA-based framework for extremely parallelized multi-agent reinforcement learning (RL) on a single graphics processing unit (GPU). At this stage, we assume you have read our [first tutorial](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-1-warp_drive_basics.ipynb) on WarpDrive basics.\n",
    "\n",
    "In this tutorial, we describe **CUDASampler**, a lightweight and fast sampler that draws actions from the policy distribution for many RL agents and environment replicas at once. `CUDASampler` runs on the GPU, so a large number of actions can be sampled in parallel.\n",
    "\n",
    "Notably:\n",
    "\n",
    "1. It reads the distribution on the GPU through PyTorch and samples actions exclusively on the GPU. There is no data transfer between the host and the device.\n",
    "2. It maximizes parallelism down to the individual thread level, i.e., each agent in each environment has its own random seed and an independent random sampling process.\n",
    "3. It is fast: in the timing comparison at the end of this notebook, it is about 7-8X faster than PyTorch's `Categorical` sampler."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8fb986f4",
   "metadata": {},
   "source": [
    "## Dependencies"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1eca0557",
   "metadata": {},
   "source": [
    "You can install the warp_drive package either\n",
    "\n",
    "- with the pip package manager, OR\n",
    "- by cloning the warp-drive repository and installing it from source.\n",
    "\n",
    "On Colab, we will do the latter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b47fd78a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "IN_COLAB = 'google.colab' in sys.modules\n",
    "\n",
    "if IN_COLAB:\n",
    "    ! git clone https://github.com/salesforce/warp-drive.git\n",
    "    %cd warp-drive\n",
    "    ! pip install -e .\n",
    "else:\n",
    "    ! pip install rl_warp_drive"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ebf8d6e",
   "metadata": {},
   "source": [
    "# Initialize CUDASampler"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "02990603",
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import numpy as np\n",
    "from warp_drive.managers.function_manager import CUDAFunctionManager, CUDASampler\n",
    "from warp_drive.managers.data_manager import CUDADataManager\n",
    "from warp_drive.utils.constants import Constants\n",
    "from warp_drive.utils.data_feed import DataFeed\n",
    "from warp_drive.utils.common import get_project_root\n",
    "\n",
    "_CUBIN_FILEPATH = f\"{get_project_root()}/warp_drive/cuda_bin\"\n",
    "_ACTIONS = Constants.ACTIONS"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7d9122e",
   "metadata": {},
   "source": [
    "We first initialize the **CUDADataManager** and **CUDAFunctionManager**. To illustrate the sampler, we then load a pre-compiled binary file called \"test_build.fatbin\". This binary is compiled together with the auxiliary files in `warp_drive/cuda_includes/core`, which provide several CUDA core services of WarpDrive, including the backend source code for `CUDASampler`.\n",
    "\n",
    "To make \"test_build.fatbin\" available, you need to go to `warp_drive/cuda_includes` and compile the test binary by calling\n",
    "`make compile-test`\n",
    "\n",
    "For this notebook demonstration, we already provide a pre-compiled binary in the bin folder.\n",
    "\n",
    "Finally, we initialize **CUDASampler** and assign the random seed. `CUDASampler` keeps independent randomness across all threads and blocks. Notice that `CUDASampler` requires a `CUDAFunctionManager`, because `CUDAFunctionManager` manages all the CUDA function pointers, including those of the sampler. Also notice that this test binary uses 2 environment replicas and 5 agents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba181381",
   "metadata": {},
   "outputs": [],
   "source": [
    "cuda_data_manager = CUDADataManager(num_agents=5, episode_length=10, num_envs=2)\n",
    "cuda_function_manager = CUDAFunctionManager(num_agents=cuda_data_manager.meta_info(\"n_agents\"),\n",
    "                                            num_envs=cuda_data_manager.meta_info(\"n_envs\"))\n",
    "cuda_function_manager.load_cuda_from_binary_file(f\"{_CUBIN_FILEPATH}/test_build.fatbin\", default_functions_included=True)\n",
    "cuda_sampler = CUDASampler(function_manager=cuda_function_manager)\n",
    "cuda_sampler.init_random(seed=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76395e1d",
   "metadata": {},
   "source": [
    "# Sampling"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1899d236",
   "metadata": {},
   "source": [
    "## Actions Placeholder"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d817f15",
   "metadata": {},
   "source": [
    "Now, we push the **actions_a** placeholder to the GPU. It has the shape `(n_envs=2, n_agents=5)`, as expected. We also make it accessible by PyTorch because, during RL training, actions will be fed into the PyTorch trainer directly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7cce5aa1",
   "metadata": {},
   "outputs": [],
   "source": [
    "data_feed = DataFeed()\n",
    "data_feed.add_data(name=f\"{_ACTIONS}_a\", data=[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]])\n",
    "cuda_data_manager.push_data_to_device(data_feed, torch_accessible=True)\n",
    "assert cuda_data_manager.is_data_on_device_via_torch(f\"{_ACTIONS}_a\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cb34d045",
   "metadata": {},
   "source": [
    "## Sampled Action Distribution"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b805d7ce",
   "metadata": {},
   "source": [
    "We define an action **distribution** here. During training, this distribution would be provided by the policy model implemented in PyTorch. The distribution has the shape `(n_envs, n_agents, n_actions)`. The last dimension, **n_actions**, is the size of the action space for a particular *discrete* action. For example, if the actions are up, down, left, right and no-op, then `n_actions=5`.\n",
    "\n",
    "**n_actions** needs to be registered with the sampler so that the sampler can pre-allocate global memory on the GPU to speed up action sampling. This is done by calling `cuda_sampler.register_actions()`.\n",
    "\n",
    "In this tutorial, we check whether the empirical distribution of the sampled actions follows the given distribution. For example, the distribution [0.333, 0.333, 0.333] below means that the 1st agent has 3 possible actions, each with equal probability."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "950a92fb",
   "metadata": {},
   "outputs": [],
   "source": [
    "cuda_sampler.register_actions(cuda_data_manager, action_name=f\"{_ACTIONS}_a\", num_actions=3)\n",
    "\n",
    "distribution = np.array([[[0.333, 0.333, 0.333], [0.2, 0.5, 0.3], [0.95, 0.02, 0.03], [0.02, 0.95, 0.03], [0.02, 0.03, 0.95]],\n",
    "                         [[0.1, 0.7, 0.2], [0.7, 0.2, 0.1], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]])\n",
    "distribution = torch.from_numpy(distribution).float().cuda()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b9edf37",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run 10000 times to collect statistics\n",
    "actions_batch = torch.from_numpy(np.empty((10000, 2, 5), dtype=np.int32)).cuda()\n",
    "\n",
    "for i in range(10000):\n",
    "    cuda_sampler.sample(cuda_data_manager,\n",
    "                        distribution,\n",
    "                        action_name=f\"{_ACTIONS}_a\")\n",
    "    actions_batch[i] = cuda_data_manager.data_on_device_via_torch(f\"{_ACTIONS}_a\")\n",
    "actions_batch_host = actions_batch.cpu().numpy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0aba16b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "actions_env_0 = actions_batch_host[:, 0]\n",
    "actions_env_1 = actions_batch_host[:, 1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9eb5ceda",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Sampled action distribution versus the given distribution (in parentheses) for env 0: \\n\")\n",
    "for agent_id in range(5): \n",
    "    print(f\"Sampled action distribution for agent_id: {agent_id}:\\n\"\n",
    "          f\"{(actions_env_0[:, agent_id] == 0).sum() / 10000.0}({distribution[0, agent_id, 0]}), \\n\"    \n",
    "          f\"{(actions_env_0[:, agent_id] == 1).sum() / 10000.0}({distribution[0, agent_id, 1]}), \\n\"\n",
    "          f\"{(actions_env_0[:, agent_id] == 2).sum() / 10000.0}({distribution[0, agent_id, 2]})  \\n\")   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "afdee114",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Sampled action distribution versus the given distribution (in parentheses) for env 1: \")\n",
    "\n",
    "for agent_id in range(5): \n",
    "    print(f\"Sampled action distribution for agent_id: {agent_id}:\\n\"\n",
    "          f\"{(actions_env_1[:, agent_id] == 0).sum() / 10000.0}({distribution[1, agent_id, 0]}), \\n\"    \n",
    "          f\"{(actions_env_1[:, agent_id] == 1).sum() / 10000.0}({distribution[1, agent_id, 1]}), \\n\"\n",
    "          f\"{(actions_env_1[:, agent_id] == 2).sum() / 10000.0}({distribution[1, agent_id, 2]})  \\n\")     "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5da9328f",
   "metadata": {},
   "source": [
    "## Action Randomness Across Threads"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2329d14",
   "metadata": {},
   "source": [
    "Another important validation is whether the sampler provides independent randomness across different agents and environment replicas. Given the same policy model for all the agents and environment replicas, we can check whether the sampled actions are independently distributed.\n",
    "\n",
    "Here, we assign all agents across all envs the same distribution [0.25, 0.25, 0.25, 0.25]. This is equivalent to a uniform distribution over the actions [0,1,2,3], for 5 agents and 2 envs. Then, for each env, we compute the standard deviation of the sampled actions across the agents and average it over all the sampling runs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42e5e607",
   "metadata": {},
   "outputs": [],
   "source": [
    "data_feed = DataFeed()\n",
    "data_feed.add_data(name=f\"{_ACTIONS}_b\", data=[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]])\n",
    "cuda_data_manager.push_data_to_device(data_feed, torch_accessible=True)\n",
    "assert cuda_data_manager.is_data_on_device_via_torch(f\"{_ACTIONS}_b\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "909a069f",
   "metadata": {},
   "outputs": [],
   "source": [
    "cuda_sampler.register_actions(cuda_data_manager, action_name=f\"{_ACTIONS}_b\", num_actions=4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c3861b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "distribution = np.array([[[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]],\n",
    "                         [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]])\n",
    "distribution = torch.from_numpy(distribution).float().cuda()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "256ad0de",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run 10000 times to collect statistics.\n",
    "actions_batch = torch.from_numpy(np.empty((10000, 2, 5), dtype=np.int32)).cuda()\n",
    "\n",
    "for i in range(10000):\n",
    "    cuda_sampler.sample(cuda_data_manager,\n",
    "                        distribution,\n",
    "                        action_name=f\"{_ACTIONS}_b\")\n",
    "    actions_batch[i] = cuda_data_manager.data_on_device_via_torch(f\"{_ACTIONS}_b\")\n",
    "actions_batch_host = actions_batch.cpu().numpy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6e0f1d7c",
   "metadata": {},
   "outputs": [],
   "source": [
    "actions_batch_host"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68ebe1b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "actions_batch_host.std(axis=2).mean(axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bce0a401",
   "metadata": {},
   "source": [
    "To check the independence of randomness among all threads, we can compare it with a NumPy implementation. Here we use `numpy.random.choice(4, 5)` to repeat the same process for a uniform distribution over the actions [0,1,2,3], with 5 agents and 2 envs. We should see that the variation of the NumPy output is very close to that of our sampler."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9f314d3a",
   "metadata": {},
   "outputs": [],
   "source": [
    "actions_batch_numpy = np.empty((10000, 2, 5), dtype=np.int32)\n",
    "for i in range(10000):\n",
    "    actions_batch_numpy[i,0,:] = np.random.choice(4, 5)\n",
    "    actions_batch_numpy[i,1,:] = np.random.choice(4, 5)\n",
    "actions_batch_numpy.std(axis=2).mean(axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a47d6b2a",
   "metadata": {},
   "source": [
    "## Running Speed"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3cc356e5",
   "metadata": {},
   "source": [
    "The total time for sampling includes receiving a new distribution and using it to sample.\n",
    "Compared with the [torch.distributions.Categorical sampler](https://pytorch.org/docs/stable/distributions.html),\n",
    "we reach a **7-8X** speedup for the distribution above.\n",
    "\n",
    "*Note: our sampler runs in parallel across threads, so this speedup is almost constant when scaling up the number of agents or environment replicas, i.e., increasing the number of threads used.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c31e9a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "from torch.distributions import Categorical"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d6dc6be7",
   "metadata": {},
   "outputs": [],
   "source": [
    "distribution = np.array([[[0.333, 0.333, 0.333], [0.2, 0.5, 0.3], [0.95, 0.02, 0.03], [0.02, 0.95, 0.03], [0.02, 0.03, 0.95]],\n",
    "                         [[0.1, 0.7, 0.2], [0.7, 0.2, 0.1], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]])\n",
    "distribution = torch.from_numpy(distribution).float().cuda()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1c3a26ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "start_event = torch.cuda.Event(enable_timing=True)\n",
    "end_event = torch.cuda.Event(enable_timing=True)\n",
    "\n",
    "start_event.record()\n",
    "for _ in range(1000):\n",
    "    cuda_sampler.sample(cuda_data_manager, distribution, action_name=f\"{_ACTIONS}_a\")\n",
    "end_event.record()\n",
    "torch.cuda.synchronize()\n",
    "print(f\"time elapsed: {start_event.elapsed_time(end_event)} ms\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a2ac9745",
   "metadata": {},
   "outputs": [],
   "source": [
    "start_event = torch.cuda.Event(enable_timing=True)\n",
    "end_event = torch.cuda.Event(enable_timing=True)\n",
    "\n",
    "start_event.record()\n",
    "for _ in range(1000):\n",
    "    Categorical(distribution).sample()\n",
    "end_event.record()\n",
    "torch.cuda.synchronize()\n",
    "print(f\"time elapsed: {start_event.elapsed_time(end_event)} ms\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60aea6b8",
   "metadata": {},
   "source": [
    "# Learn More and Explore our Tutorials!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a509e7d",
   "metadata": {},
   "source": [
    "Next, we suggest you check out our advanced [tutorial](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-3-warp_drive_reset_and_log.ipynb) on WarpDrive's reset and log controller!\n",
    "\n",
    "For your reference, all our tutorials are here:\n",
    "- [A simple end-to-end RL training example](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/simple-end-to-end-example.ipynb)\n",
    "- [WarpDrive basics](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-1-warp_drive_basics.ipynb)\n",
    "- [WarpDrive sampler](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-2-warp_drive_sampler.ipynb)\n",
    "- [WarpDrive reset and log](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-3-warp_drive_reset_and_log.ipynb)\n",
    "- [Creating custom environments](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-4-create_custom_environments.ipynb)\n",
    "- [Training with WarpDrive](https://www.github.com/salesforce/warp-drive/blob/master/tutorials/tutorial-5-training_with_warp_drive.ipynb)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
